Supporting Data Integrity

Let's understand the downside of using the EAV database design in detail.

When we use EAV, we sacrifice many advantages that a conventional database design would have given us.

We can’t make mandatory attributes#

Returning to the example stated above, to should also require that the date_reported attribute has a value. In a conventional database design, it would be simple to enforce a mandatory column by declaring the column NOT NULL.

In the EAV design, each attribute corresponds to a row in the IssueAttributes table, not a column. You would need a constraint that checks that a row exists for each issue_id value, and the row must have the string date_reported in its attr_name column.

However, SQL doesn’t support a constraint that can do this. So, you must write the application code to enforce it. If you do find a bug with no reported date, should you add a value for this attribute? What value should you give it? If you make a guess or use some default value for a missing attribute, how does that affect the accuracy of your boss’s reports?

You can’t use SQL data types#

Your boss tells you that they are having trouble running their report because people have entered dates in different formats or sometimes even strings that aren’t dates. In a conventional database, you can prevent this if you declare the column with the DATE data type.

Inserting invalid data in the field (after changing the data type to DATE)

Try it yourself in the following playground.

Retrieving data after inserting invalid data in the field

In the EAV design, the data type of the IssueAttributes.attr_value column is typically a string to accommodate all possible attributes in a single column. So, it has no way of rejecting invalid data.

Inserting invalid data in the attr_value in EAV design

Try it yourself in the following playground.

Inserting invalid data in the attr_value in EAV design

Some people try to extend the EAV design by defining a separate attr_value column for each SQL data type, leaving null in the unused columns. Doing so would allow you to use data types but it would make queries even worse:

Defining a separate attr_value column for each SQL data type

You would need to add even more columns to support user-defined data types or domains.

Enforced referential integrity#

In a conventional database, we can restrict the range of some attributes by defining a foreign key to a lookup table. For example, the status attribute of a bug or issue should be one of a small list of values stored in the BugStatus table.

Creating a table having status as a foreign key

In the EAV design, we can’t apply this kind of constraint on the attr_value column. A referential integrity constraint applies to every row in the table.

If we define this constraint, it will force every attribute to match a value in BugStatus, not just the status attribute.

Inserting data in IssueAttributes table

Let’s try it in the following playground. We will add the given code above and see how it works.

Creating table having status as foreign key and inserting data

We will see an error, which occurs as a result of using a column ofBugStatus as a foreign key. It would need to have the same data available in the column status of the BugStatus table to be entered in the attr_value column.

You can’t make up attribute names#

Your boss’ reports are still not reliable. You find that attributes are not being named consistently. One bug uses an attribute named by the string date_ reported, but another bug names the attribute by the string report_date. Both are clearly intended to represent the same information.

How would you count the number of bugs per date?

Run the code in the following playground.

Counting the bugs per date

How would you know if a given bug hasn’t stored an attribute by yet another name? How would you know if a given bug has stored a given attribute twice by two different names? How can you prevent such mistakes?

One remedy may be to declare a foreign key on the attr_name column that links to a lookup table that contains your approved attribute names. However, this doesn’t support attributes you define on the fly for each entity, even though that’s a common practice when using the EAV design.

Antipattern: Use a Generic Attribute Table
Solution: Model the Subtypes Using Inheritance
Mark as Completed
Report an Issue